Tom,
> Hm. Do they use query-cancels at all? The reference to async_notify
> makes me wonder if this is related to the recently-discovered
> async_notify bug that could prevent fast-mode shutdowns. I'm not
> certain how that might lead to an apparent deadlock, but a query cancel
> arriving during async_notify would surely improve the odds of trouble.
Not that I know of, unless it's for cleaning up queries when quitting the app or in other abort-type states.
> If you don't mind running a slightly customized version, you might try
> back-patching this fix:
> http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/commands/async.c.diff?r1=1.91&r2=1.91.2.1
> into 7.2.4 and see if that improves matters.
I'll give that a shot.
> If it doesn't, I'd be interested to look into the matter, but I'd
> probably need access to the machine to see what is going on.
That's probably possible, but there are some client confidentiality issues.
> > Is there anything I can do to debug this? I'm willing to give it a
> > shot, but I'm also rapidly preparing a single proc linux/intel machine
> > to take over db duties.
>
> I think you're mistaken to be blaming the hardware...
The Linux box is a migration that's being accelerated because of this issue. It has more drive space, more memory,
no app servers, and control of the kernel shared memory parameters.
eric